[QNN-EP] Implement file mapped weights feature #26952
edgchen1 merged 19 commits into microsoft:main
Conversation
The observation is not entirely accurate. ORT memory-maps external weights. You can request a weight as an ORT Value from the EP; if the weight is external, it will be memory mapped. See
I suggest not implementing QNN-specific mapping, but reusing existing code in ORT.
Discussed offline: the EP maps initializers from the binary context, not from the external weight files.
- Create file mapping callback interface class
- Android expected to have support in the future
- Implement Windows callbacks in WindowsFileMapper
- New option: disable_file_mapped_weights
- Feature is enabled by default, with retry logic
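As a rough illustration of the new opt-out option, the sketch below builds a QNN EP provider entry for onnxruntime's `InferenceSession(..., providers=[...])` API. The exact option key and value format are assumptions based on the `disable_file_mapped_weights` name in this PR, not confirmed documentation; `QnnHtp.dll` is the typical HTP backend library name on Windows.

```python
# Hypothetical sketch: option key and value format are assumptions taken
# from the PR's "disable_file_mapped_weights" option name.
def make_qnn_provider_entry(disable_file_mapping: bool):
    """Build a (provider_name, options) pair for InferenceSession(..., providers=[...])."""
    options = {"backend_path": "QnnHtp.dll"}  # typical QNN HTP backend on Windows
    if disable_file_mapping:
        # Opt out of file-mapped weights (the feature is on by default per this PR).
        options["disable_file_mapped_weights"] = "1"
    return ("QNNExecutionProvider", options)

entry = make_qnn_provider_entry(disable_file_mapping=True)
print(entry)
```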
Force-pushed from f55dc78 to 2e451ae.
Please avoid force pushes.
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Please comment on all Copilot review issues before resolving them.
QnnBackendManager::SetupBackend if file mapping is not available
Make file mapping callbacks more thread-safe; do not destruct file_mapper_ until session destruction.
and unnecessary functions relating to file_mapped_weights_enabled_
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Description
Enables file mapping of the weights as well as the overall context bin. This feature is currently enabled only on Windows ARM64 devices.
Motivation and Context
Currently, when reading the context bin, ORT allocates a large buffer on the heap, and each ORT session using the same model allocates its own copy. This is very wasteful for large models. Instead, Windows file mapping can be leveraged to map the context bin once; whenever a context needs to be created from it, a pointer into the mapping is used in place of a pre-allocated buffer, making QNN EP more memory-efficient. With multiple ORT sessions, the context bin is loaded only once and shared by all sessions, improving both memory efficiency and overall initialization performance. This is especially valuable for LLM workloads going forward.
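The sharing behavior described above can be sketched in a few lines. The real feature uses the Win32 `CreateFileMapping`/`MapViewOfFile` APIs from C++; the Python `mmap` module is used here purely to illustrate the concept that multiple read-only views of the same file share OS pages instead of duplicating a heap buffer per session.

```python
import mmap
import os
import tempfile

def write_fake_context_bin(path: str) -> None:
    # Placeholder stand-in for a QNN context binary.
    with open(path, "wb") as f:
        f.write(b"QNN-CONTEXT-BIN" + bytes(1024))

def open_mapped_view(path: str) -> mmap.mmap:
    # ACCESS_READ gives a read-only, OS-shared view; mmap duplicates the
    # file descriptor, so the file object can be closed immediately.
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

path = os.path.join(tempfile.mkdtemp(), "ctx.bin")
write_fake_context_bin(path)

# Two "sessions" map the same context bin: the pages are shared by the OS
# rather than each session reading the file into its own heap buffer.
view_a = open_mapped_view(path)
view_b = open_mapped_view(path)
print(bytes(view_a[:15]))
```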